Abstract. We construct the Google matrix of the entire Twitter network, as of July 2009, and analyze
its spectrum and eigenstate properties including the PageRank and CheiRank vectors and 2DRanking of
all nodes. Our studies show much stronger inter-connectivity between top PageRank nodes for the Twitter
network compared to the networks of Wikipedia and British Universities studied previously. Our analysis
allows us to locate the top Twitter users who control the information flow on the network. We argue that
this small fraction of the whole number of users, which can be viewed as the social network elite, plays the
dominant role in the process of opinion formation on the network.

PACS. 05.10.-a Computational methods in statistical physics and nonlinear dynamics - 89.20.-a Interdisciplinary
applications of physics - 89.75.Fb Structures and organization in complex systems

1 Introduction

Twitter is an online directed social network that enables
its users to exchange short communications of up to 140
characters [1]. In March 2012 this network had around 140
million active users [1]. Founded in 2006, this network has
grown enormously fast, reaching 41 million users in July
2009, only three years after its creation. The crawling and
statistical analysis of the entire Twitter network, collected
in July 2009, was done by the KAIST group [2], with
additional statistical characteristics available at LAW DSI
of Milano University [3]. This network has scale-free
properties with an average power law distribution of ingoing
and outgoing links [2,3], which is typical for the World Wide
Web (WWW), Wikipedia and other social networks (see e.g.
[4,5,6]). In this work we use this Twitter dataset to
construct the Google matrix [7,8] of this directed network
and we analyze the spectral properties of its eigenvalues
and eigenvectors. Even though the entire size of Twitter
2009 is very large, the powerful Arnoldi method (see e.g.
[9,10,11,12]) allows one to obtain the spectrum and
eigenstates for the largest eigenvalues.

A special analysis is performed for the PageRank vector,
used in the Google search engine [7,8], and the CheiRank
vector studied for the Linux Kernel network [13,14], the
Wikipedia articles network [6], the world trade network
[15] and other directed networks [16]. While the components
of the PageRank vector are on average proportional to the
number of ingoing links [17], the components of the
CheiRank vector are on average proportional to the number
of outgoing links [6,13], which leads to a two-dimensional
ranking of all network nodes [16]. Thus our studies allow
us to analyze the spectral properties of the entire Twitter
network, whose enormous size is one to two orders of
magnitude larger than in previous studies.


The paper is organized as follows: the construction of
the Google matrix and its global structure are described
in Section 2; the properties of the spectrum and eigenvectors
of the Google matrix of Twitter are presented in Section
3; properties of 2DRanking of the Twitter network are
analyzed in Section 4 and the discussion of the results is
given in Section 5. Detailed data and results of our
statistical analysis of the Twitter matrix are presented on
the web page [18].



2 Google matrix construction 

The Google matrix of the Twitter network is constructed
following the standard rules described in [7,8]: we consider
the elements $A_{ij}$ of the adjacency matrix to be equal to
unity if a user (or node) $j$ points to user $i$ and zero
otherwise. Then the Google matrix of the network with $N$
users is given by

$$G_{ij} = \alpha S_{ij} + (1-\alpha)/N , \tag{1}$$

where the matrix $S$ is obtained by normalizing to unity
all columns of the adjacency matrix $A_{ij}$ with at least one
non-zero element, and replacing columns with only zero
elements, corresponding to the dangling nodes, by $1/N$.
The damping factor $\alpha$ in the WWW context describes the
probability $(1-\alpha)$ to jump to any node for a random
surfer. The value $\alpha = 0.85$ gives a good classification for
WWW [8] and thus we also use this value here. The matrix
$G$ belongs to the class of Perron-Frobenius operators [8],
its largest eigenvalue is $\lambda = 1$ and other eigenvalues have
$|\lambda| \leq \alpha$. The right eigenvector at $\lambda = 1$ gives the
probability $P(i)$ to find a random surfer at site $i$ and is
called the PageRank. Once the PageRank is found, all nodes
can be sorted by decreasing probabilities $P(i)$. The node
rank is then given by the index $K(i)$ which reflects the
relevance of the node $i$. The top PageRank nodes are located
at small values of $K(i) = 1, 2, \ldots$.
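The construction of Eq. (1) and the PageRank ordering can be illustrated on a toy network. The following is only a minimal sketch, assuming a plain power iteration (not the numerical method used for the Twitter data) and hypothetical helper names `pagerank` and `rank_index`; the matrix $G$ is never stored, only the links of $S$ are used.

```python
def pagerank(links, n, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for the right eigenvector of G at eigenvalue 1.

    links: iterable of (j, i) pairs meaning node j points to node i,
           i.e. A_ij = 1 (duplicate pairs are counted once).
    """
    out = [set() for _ in range(n)]
    for j, i in links:
        out[j].add(i)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        # Dangling nodes: their columns of S are uniformly 1/N.
        dangling = sum(p[j] for j in range(n) if not out[j])
        base = (1.0 - alpha) / n + alpha * dangling / n
        new = [base] * n
        for j in range(n):
            if out[j]:
                share = alpha * p[j] / len(out[j])  # column j normalized to unity
                for i in out[j]:
                    new[i] += share
        err = sum(abs(a - b) for a, b in zip(new, p))
        p = new
        if err < tol:
            break
    return p


def rank_index(p):
    """Return K(i): rank 1 for the node with the largest probability."""
    order = sorted(range(len(p)), key=lambda i: -p[i])
    k = [0] * len(p)
    for rank, i in enumerate(order, start=1):
        k[i] = rank
    return k


# Toy example: a 3-cycle 0 -> 1 -> 2 -> 0 plus node 3 pointing to node 0.
toy_links = [(0, 1), (1, 2), (2, 0), (3, 0)]
P = pagerank(toy_links, 4)
K = rank_index(P)
```

Applying the same function to the list of links with every pair reversed gives the vector of the inverted network, which is how the CheiRank discussed below can be obtained in this sketch.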

The PageRank dependence on $K$ is well described by
a power law $P(K) \propto 1/K^{\beta_{in}}$ with $\beta_{in} \approx 0.9$. This is
consistent with the relation $\beta_{in} = 1/(\mu_{in} - 1)$ corresponding
to the average proportionality of the PageRank probability
$P(i)$ to the in-degree $k(i)$, distributed as
$w_{in}(k) \propto 1/k^{\mu_{in}}$, where $k(i)$ is the number of ingoing
links for a node $i$ [8,17]. For the WWW it is established that
for the ingoing links $\mu_{in} \approx 2.1$ (with $\beta_{in} \approx 0.9$) while for
the out-degree distribution $w_{out}$ of outgoing links the power
law has the exponent $\mu_{out} \approx 2.7$ [4,5]. Similar values of
these exponents are found for the WWW British university
networks [12], the procedure call network of Linux Kernel
software introduced in [13] and for the Wikipedia hyperlink
citation network of English articles (see e.g. [6]).
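The relation between the two exponents follows from a short counting argument: if the probability of a node is on average proportional to its degree $k$, and the degree distribution decays as $1/k^{\mu}$, then the number of nodes with degree larger than $k$ scales as $K \propto k^{-(\mu-1)}$, hence $P \propto k \propto K^{-1/(\mu-1)}$. A minimal numerical check, with the hypothetical helper name `beta_from_mu`:

```python
def beta_from_mu(mu):
    """Rank exponent beta = 1/(mu - 1) for a degree distribution ~ 1/k**mu."""
    return 1.0 / (mu - 1.0)


print(round(beta_from_mu(2.1), 3))  # 0.909, the WWW in-degree case
print(round(beta_from_mu(2.7), 3))  # 0.588, the WWW out-degree case
```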

In addition to the Google matrix $G$ we also analyze
the properties of the matrix $G^*$ constructed from the network
with inverted directions of links, with the adjacency
matrix $A_{ij} \to A_{ji}$. After the inversion of links the Google
matrix $G^*$ is constructed via the procedure (1) described
above. The right eigenvector at unit eigenvalue of the
matrix $G^*$ is called the CheiRank [13,14]. In analogy with the
PageRank the probability values of CheiRank are
proportional to the number of outgoing links, due to the
inversion of links. All nodes of the network can be ordered
in decreasing order of probability with the CheiRank index
$K^*(i)$, with $P^* \propto 1/{K^*}^{\beta_{out}}$ and $\beta_{out} = 1/(\mu_{out} - 1)$.
Since each node $i$ of the network is characterized both by
PageRank $K(i)$ and CheiRank $K^*(i)$ indexes the ranking of
nodes becomes two-dimensional. While PageRank highlights
well-known popular nodes, CheiRank highlights communicative
nodes. As discussed in [6,13,16], such 2DRanking allows one
to characterize the information flow on networks in a more
efficient and rich manner. It is convenient to characterize
the interdependence between PageRank and CheiRank
vectors by the correlator



$$\kappa = N \sum_{i=1}^{N} P(K(i))\, P^*(K^*(i)) - 1 . \tag{2}$$



As shown in [13,16], we have $\kappa \approx 0$ for the Linux Kernel
network and transcription gene networks, and $\kappa \approx 2 - 4$ for
University and Wikipedia networks.
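As a minimal sketch of how such a correlator can be evaluated, the function below takes the two probability vectors indexed by node. It assumes the normalization in which statistically independent vectors give a value close to zero (hence the subtraction of 1); the function name `correlator` is hypothetical.

```python
def correlator(p, p_star):
    """kappa = N * sum_i P(i) * P*(i) - 1 for vectors indexed by node i.

    Assumption: both vectors are normalized to unit sum, so that a
    uniform (uncorrelated) pair gives kappa = 0.
    """
    n = len(p)
    return n * sum(a * b for a, b in zip(p, p_star)) - 1.0


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.5, 0.5, 0.0, 0.0]
kappa_uncorrelated = correlator(uniform, uniform)  # no correlation
kappa_correlated = correlator(peaked, peaked)      # same nodes dominate both
```

The individual terms $N P(i) P^*(i)$ of the sum can be inspected separately to see which nodes dominate the total value.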

In this work we apply the Google matrix analysis
developed in [6,12,13,14,15,16] to the Twitter 2009 network
available at [2,3]. The total size of the Google matrix
is $N = 41652230$ and the number of links is $N_\ell =
1468365182$. This matrix size is one to two orders of
magnitude larger than those studied in [12,14,16]. The
number of links per node is $\xi_\ell = N_\ell/N \approx 35$, which is
by a factor $1.5 - 3.5$ larger than for the Wikipedia network
or the Cambridge University 2006 network. The matrix elements of $G$




Fig. 1. Google matrix of Twitter: matrix elements of $G$
(left column) and $G^*$ (right column) are shown in the basis
of PageRank index $K$ (and $K'$) of matrix $G_{KK'}$ (left column
panels) and in the basis of CheiRank index $K^*$ (and ${K^*}'$) of
matrix $G^*_{K^*{K^*}'}$ (right column panels). Here, the $x$ (and $y$)
axis shows $K$ (and $K'$) (left column) (and respectively $K^*$ and
${K^*}'$ on the right column) with the range $1 \leq K, K' \leq 200$ (top
panels); $1 \leq K, K' \leq 400$ (middle panels); $1 \leq K, K' \leq N$
(bottom panels). All nodes are ordered by PageRank index $K$ of
the matrix $G$ and thus we have two matrix indexes $K, K'$ for
matrix elements in this basis (left column) and respectively
$K^*, {K^*}'$ for matrix $G^*$ (right column). Bottom panels show
the coarse-grained density of matrix elements $G_{K,K'}$ and
$G^*_{K^*,{K^*}'}$; the coarse graining is done on $500 \times 500$ square
cells for the entire Twitter network. We use a standard matrix
representation with $K = K' = 1$ on the top left panel corner
(left column) and respectively $K^* = {K^*}' = 1$ (right column).
Color shows the amplitude of matrix elements in top and middle
panels or their density in the bottom panels, changing from
blue for minimum zero value to red at maximum value. Here the
damping factor is $\alpha = 1$.



and $G^*$ are shown in Fig. 1 on a scale of top 200 (top
panels) and 400 (middle panels) values of $K$ (for $G$) and
$K^*$ (for $G^*$), and in a coarse-grained image for the whole
matrix size scale (bottom panels).

It is interesting to note that the coarse-grained image
has well visible hyperbolic onion curves of high density
which are similar to those found in [16] for Wikipedia and
University networks. In [16] the appearance of such curves
was attributed to the existence of specific categories. We
assume that for the Twitter network such curves are a result
of enhanced links between various categories of users (e.g.
actors, journalists etc.) but their detailed origin is still
to be established.

In the following sections we also compare the properties
of the Twitter network with those of the Wikipedia
articles network from [6]. Some spectral properties of the
Wikipedia network with $N = 3282257$ nodes and $N_\ell =
71012307$ links are analyzed in [12,16]. We also compare
certain parameters with the networks of Cambridge and
Oxford Universities of 2006 with $N = 212710$ and $N =
200823$ nodes and with $N_\ell = 2015265$ and $N_\ell = 1831542$
links respectively. The properties of these networks are
discussed in [12,16]. The gallery of the Google matrix $G$
images for these networks, as well as for the Linux Kernel
network, is presented in [16]. The comparison with
the data shown in Fig. 1 here shows that for the Twitter
network the matrix is much more strongly interconnected at
moderate $K$ values. We return to this point in Sections
4 and 5.



3 Spectrum and eigenstates of Twitter 

To obtain the spectrum of the Google matrix of Twitter
we use the Arnoldi method [9,10,11]. However, at first,
following the approach developed in [12], we determine
the invariant subspaces of the Twitter network. For that,
for each node we find iteratively the set of nodes that can
be reached by a chain of non-zero matrix elements of $S$.
Usually, there are several such invariant isolated subsets
and the size of such subsets is smaller than the whole
matrix size. These subsets are invariant with respect to
applications of the matrix $S$. We merge all subspaces with
common members, and obtain a sequence of disjoint
subspaces $V_j$ of dimension $d_j$ invariant under applications
of $S$. The remaining part of nodes forms the wholly connected
core space. Such a classification scheme can be efficiently
implemented in a computer program; it provides a subdivision
of network nodes into $N_c$ core space nodes (typically
70-80% of $N$ for British University networks [12]) and $N_s$
subspace nodes belonging to at least one of the invariant
subspaces $V_j$, which induces a block triangular structure of $S$.
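The classification described above can be sketched as follows. This is only an illustrative version under stated assumptions: the reachable set of each node is explored by breadth-first search, a node is assigned to the core space as soon as its reachable set exceeds a cutoff `max_size` (a hypothetical parameter) or contains a dangling node (whose column of $S$ connects it to all nodes), and overlapping small sets are merged with a union-find structure. The function name `invariant_subspaces` is also hypothetical.

```python
from collections import deque


def invariant_subspaces(out_links, max_size):
    """Split nodes into small invariant subspaces of S and the core space.

    out_links[j] lists the nodes i with A_ij = 1 (node j points to i).
    Returns (subspaces, core): a sorted list of disjoint node lists and
    the list of remaining core-space nodes.
    """
    n = len(out_links)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    in_subspace = [False] * n
    for start in range(n):
        seen = {start}
        queue = deque([start])
        small = True
        while queue:
            j = queue.popleft()
            if not out_links[j]:        # dangling node reaches every node
                small = False
                break
            for i in out_links[j]:
                if i not in seen:
                    seen.add(i)
                    queue.append(i)
            if len(seen) > max_size:    # reachable set too large: core node
                small = False
                break
        if small:
            # The reachable set is closed under S; merge it with any
            # previously found subspace sharing a member.
            for v in seen:
                in_subspace[v] = True
                parent[find(v)] = find(start)

    groups = {}
    for v in range(n):
        if in_subspace[v]:
            groups.setdefault(find(v), []).append(v)
    subspaces = sorted(groups.values())
    core = [v for v in range(n) if not in_subspace[v]]
    return subspaces, core


# Toy network: two closed pairs {0,1} and {5,6}; nodes 2, 3, 4 form the core.
toy = [[1], [0], [0, 3], [2, 4], [], [6], [5]]
subspaces, core = invariant_subspaces(toy, max_size=3)
```

Each start node is explored independently here for clarity; a production version for a network of this size would reuse the results of earlier searches.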



In terms of the subspace (s) and core (c) nodes the matrix
$S$ takes the form

$$S = \begin{pmatrix} S_{ss} & S_{sc} \\ 0 & S_{cc} \end{pmatrix} . \tag{3}$$

Here the subspace-subspace block $S_{ss}$ is actually composed
of many diagonal blocks for each of the invariant
subspaces. Each of these blocks corresponds to a column
sum normalized matrix of the same type as $G$ and has
therefore at least one unit eigenvalue, thus explaining the
high degeneracy. Its eigenvalues and eigenvectors are easily
accessible by numerical diagonalization (for full matrices),
thus allowing one to count the number of unit eigenvalues.





Fig. 2. Spectrum of the matrix $S$ ($S^*$ with inverted
direction of links) for the Twitter network shown on left
panels (right panels). Top panels: subspace eigenvalues (blue
dots) and core space eigenvalues (red dots) in the $\lambda$-plane
(green curve shows the unit circle); there are 17504 (66316)
invariant subspaces, with maximal dimension 44 (2959), and the
sum of all subspace dimensions is $N_s = 40307$ (180414). The
core space eigenvalues are obtained from the Arnoldi method
applied to the core space subblock $S_{cc}$ of $S$ with Arnoldi
dimension 640 as explained in Ref. [12]. Bottom panels:
fraction $j/N$ of eigenvalues with $|\lambda| > |\lambda_j|$ for the core
space eigenvalues (red bottom curve) and all eigenvalues (blue
top curve) from raw data of top panels. The number of
eigenvalues with $|\lambda_j| = 1$ is 34135 (129185), of which 17505
(66357) are at $\lambda_j = 1$; this number is (slightly) larger than
the number of invariant subspaces, each of which has at least
one unit eigenvalue. Note that in the bottom panels the number
of eigenvalues with $|\lambda_j| = 1$ is artificially reduced to 200
in order to have a better scale on the vertical axis. The
correct number of those eigenvalues corresponds to
$j/N = 8.195 \times 10^{-4}$ ($3.102 \times 10^{-3}$), which is strongly
outside the vertical panel scale.



We find for the $G$ matrix of Twitter 2009 that there
are $N_s = 40307$ subspace nodes with a maximal subspace
dimension of 44 (most subspaces are of dimension 2 or 3).
For the matrix $G^*$ we find $N_s = 180414$, also with a lot
of subspaces of dimension 2 or 3 and a maximal subspace
dimension of 2959. The remaining eigenvalues of $S$ can be
obtained from the projected core block $S_{cc}$ which is not
column sum normalized (due to non-zero matrix elements
in the block $S_{sc}$) and has therefore eigenvalues strictly
inside the unit circle, $|\lambda_j^{(\mathrm{core})}| < 1$. We have applied the
Arnoldi method (AM) [9,10,11] with Arnoldi dimension
$n_A = 640$ to determine the largest eigenvalues of $S_{cc}$, which
required a machine with 250 GB of physical RAM
to store the non-zero matrix elements of $S$ and the 640
vectors of the Krylov space.
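To illustrate the idea of the method, the following is a minimal textbook-style sketch of the Arnoldi iteration, not the implementation used for the Twitter data. It assumes NumPy is available and uses a hypothetical function name `arnoldi_ritz_values`; only matrix-vector products are required, so the matrix can be kept in sparse form, while the memory cost is dominated by the stored Krylov vectors.

```python
import numpy as np


def arnoldi_ritz_values(matvec, n, n_a, seed=0):
    """Approximate the largest eigenvalues (in modulus) of an n x n matrix.

    matvec: function returning the product of the matrix with a vector.
    n_a:    Arnoldi dimension, i.e. the number of Krylov vectors used.
    Returns the Ritz values sorted by decreasing modulus.
    """
    rng = np.random.default_rng(seed)
    q = rng.random(n)
    Q = np.zeros((n, n_a + 1))      # orthonormal Krylov basis
    H = np.zeros((n_a + 1, n_a))    # upper Hessenberg representation
    Q[:, 0] = q / np.linalg.norm(q)
    m = n_a
    for k in range(n_a):
        w = matvec(Q[:, k])
        for j in range(k + 1):      # modified Gram-Schmidt orthogonalization
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] < 1e-12:     # Krylov space became invariant
            m = k + 1
            break
        Q[:, k + 1] = w / H[k + 1, k]
    ritz = np.linalg.eigvals(H[:m, :m])
    return ritz[np.argsort(-np.abs(ritz))]


# Toy example: a directed 3-cycle, whose eigenvalues are the cube roots of unity.
cycle = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
values = arnoldi_ritz_values(lambda v: cycle @ v, n=3, n_a=3)
```

In this toy case the Krylov space spans the whole space, so the Ritz values coincide with the exact spectrum; for a large matrix with $n_A \ll N$ only the eigenvalues of largest modulus are well approximated.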

In general the Arnoldi method provides numerically
accurate values for the largest eigenvalues (in modulus)
but their number depends crucially on the Arnoldi
dimension. In our case there is a considerable density of real
eigenvalues close to the points $1$ and $-1$ where convergence
is rather difficult. Comparing the results for different
values of $n_A$, we find that for the matrix $S$ ($S^*$) the first 200
(150) eigenvalues are correct within a relative error below
0.3 % while the majority of the remaining eigenvalues
with $|\lambda_j| \geq 0.5$ ($|\lambda_j| \geq 0.6$) have a relative error of 10 %.
However, the well isolated complex eigenvalues, well
visible in Fig. 2, converge much better and are numerically
accurate. The first three core space eigenvalues of $S$ ($S^*$)
are also numerically accurate.

The composed spectrum of subspaces and core space
eigenvalues obtained by the Arnoldi method is shown in
Fig. 2 for $G$ and $G^*$. The obtained results show that the
fraction of invariant subspaces with $\lambda = 1$ ($g_1 = N_s/N \approx
10^{-3}$) is by orders of magnitude smaller than the one found
for British Universities ($g_1 \approx 0.2$ at $N \approx 2 \times 10^5$) [12]. We
note that the cross and triple-star structures are visible
for the Twitter spectrum in Fig. 2 but they are significantly
less pronounced as compared to the case of the Cambridge
and Oxford network spectra (see Fig. 2 in [12]). It is
interesting that such triplet and cross structures naturally
appear in the spectra of random unistochastic matrices of
size $N = 3$ and $4$ which have been analyzed analytically
and numerically in [19]. A similar star-structure spectrum
appears also in sparse regular graphs with loops studied
recently in [20], even if in the latter case the spectrum goes
outside of the unit circle. This shows that even in large size
networks the loop structure between 3 or 4 dominant types
of nodes is well visible for University networks. For the
Twitter network it is less pronounced, probably due to a larger
number $\xi_\ell$ of links per node. At the same time a circle
structure in the spectrum remains well visible both for
Twitter and University networks. The integrated number
of eigenvalues as a function of $|\lambda|$ is shown in the
bottom panels of Fig. 2. Further detailed analysis is required
for a better understanding of the origin of such spectral
structures.

It is interesting to note that a circular structure, formed
by eigenvalues $\lambda_i$ with $|\lambda_i|$ being close to unity (see red and
blue points in top left and right panels of Fig. 2), is rather
similar to those appearing in the Ulam networks of
intermittency maps studied in [21] (see Fig. 4 there). Following
an analogy with the dynamics of these one-dimensional
maps we may say that the eigenstates related to such a
circular structure correspond to quasi-isolated communities,
being similar to orbits in a vicinity of the intermittency
region, where the information circulates mainly inside the
community with only a very little flow outside of it.

The eigenstates of $G$ and $G^*$ with $|\lambda|$ being unity or
close to unity are shown in Fig. 3. For the PageRank $P$
(CheiRank $P^*$) we compare its dependence on the
corresponding index $K$ ($K^*$) with the PageRank (CheiRank)






Fig. 3. The left (right) panel shows the PageRank $P$
(CheiRank $P^*$) versus the corresponding rank index $K$ ($K^*$)
for the Google matrix of Twitter at the damping parameter
$\alpha = 0.85$ (thick black curve); for comparison the PageRank
(CheiRank) of the Google matrix of the Wikipedia network
[6] is shown by the gray curve at the same $\alpha$. The colored thin
curves (shifted down by a factor 1000 for clarity) show the
modulus of four core space eigenvectors $|\psi_i|$ ($|\psi_i^*|$) of $S$ ($S^*$)
versus their own ranking indexes $K_i$ ($K_i^*$). Red and green
lines are the eigenvectors corresponding to the two largest
core space eigenvalues (in modulus) $\lambda_1 = 0.99997358$, $\lambda_2 =
0.99932634$ ($\lambda_1^* = 0.99997002$, $\lambda_2^* = 0.99994658$); blue and
pink lines are the eigenvectors corresponding to the two
complex eigenvalues $\lambda_{151} = 0.09032572 + i\,0.90000530$, $\lambda_{161} =
-0.47504961 + i\,0.76576321$ ($\lambda_{457}^* = 0.38070896 + i\,0.39207668$,
$\lambda_{105}^* = -0.45794117 + i\,0.80825210$). Eigenvalues and
eigenvectors are obtained by the Arnoldi method with Arnoldi
dimension 640 as for the data in Fig. 2.



of the Wikipedia network analyzed in [6,12,16], whose size
$N$ (number of links $N_\ell$) is smaller by a factor of 10 (20).
Surprisingly we find that the PageRank $P(K)$ of Twitter,
approximated by the algebraic decay $P(K) = a/K^{\beta}$,
has a slower drop as compared to the Wikipedia case. Indeed,
we have $\beta = 0.540 \pm 0.004$ ($a = 0.00054 \pm 0.00002$) for
the PageRank of Twitter in the range $1 \leq \log_{10} K \leq 6$
(a similar value was found in [22]) while
we have $\beta = 0.767 \pm 0.0005$ ($a = 0.0086 \pm 0.00035$) for
the same range of PageRank of the Wikipedia network. Also
we have a sharper drop of CheiRank with $\beta = 0.857 \pm
0.003$ ($a = 0.0148 \pm 0.0004$) compared to that of the
PageRank of Twitter, while for the CheiRank of the Wikipedia
network we find the opposite tendency ($\beta = 0.620 \pm 0.001$, $a =
0.0015 \pm 0.00002$) in the same index range. Thus for the
Twitter network the PageRank is more delocalized compared
to CheiRank (e.g. $P(1) < P^*(1)$) while usually one has the
opposite relation (e.g. for Wikipedia $P(1) > P^*(1)$). We
attribute this to the enormously high inter-connectivity
between the top PageRank nodes K < 10"^ which is well 
visible in Fig. [H 
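An algebraic decay $P(K) = a/K^{\beta}$ can be fitted by a straight line in log-log scale. The sketch below is one possible procedure and is not claimed to be the fit used for the values quoted above: it assumes an unweighted least-squares fit over all integer $K$ in the chosen range, and the function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(p_sorted, k_min, k_max):
    """Fit P(K) = a / K**beta for k_min <= K <= k_max.

    p_sorted[K-1] is the probability of the node with rank index K.
    Returns (a, beta) from a least-squares line in log10-log10 scale.
    """
    xs = [math.log10(k) for k in range(k_min, k_max + 1)]
    ys = [math.log10(p_sorted[k - 1]) for k in range(k_min, k_max + 1)]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    beta = -slope
    a = 10.0 ** (my - slope * mx)
    return a, beta


# Synthetic check: an exact power law is recovered by the fit.
synthetic = [0.01 / k ** 0.54 for k in range(1, 1001)]
a_fit, beta_fit = fit_power_law(synthetic, 10, 1000)
```

Since the points are uniform in $K$ rather than in $\log_{10} K$, large ranks dominate such a fit; binning in logarithmic scale is a common alternative.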

We should also point out a specific property of PageRank
and CheiRank vectors which has already been noted
in [23]: there are some degenerate plateaus in $P(K(i))$ or
$P^*(K^*(i))$ with absolutely the same values of $P$ or $P^*$
for a few nodes. For example, for the Twitter network
we have the appearance of the first degenerate plateau at
$P = 7.639 \times 10^{-7}$ for $196489 \leq K \leq 196491$. As a result
the PageRank index $K$ can be ordered in various ways.
We attribute this phenomenon to the fact that the matrix
elements of $G$ are composed of rational elements, which
leads to this type of degeneracy. However, the sizes of
such degenerate plateaus are relatively short and they do
not influence significantly the PageRank order. Indeed,
on large scales the curves of $P(K)$, $P^*(K^*)$ are rather
smooth, being characterized by a finite slope (see Fig. 3).
Similar degenerate plateaus exist for the networks of
Wikipedia, Cambridge and Oxford Universities.

Other eigenvectors of $G$ and $G^*$ of the Twitter network are
shown by color curves in Fig. 3. We see that the shape of
eigenstates with $\lambda_1$ and $\lambda_2$, shown as a function of their
monotonic decrease index $K_i$, is well pronounced in $P(K)$.
Indeed, these vectors have a rather small gap separating
them from unity ($|\Delta\lambda| \sim 2 \times 10^{-5}$) and thus they
significantly contribute to the PageRank at $\alpha = 0.85$. At the
same time we note that the gap values are significantly
smaller than those for certain British Universities (see e.g.
Fig. 4 in [12]). We argue that a larger number of links $\xi_\ell$ for
Twitter is at the origin of the moderate spectral gap between
the core space spectrum and $\lambda = 1$. The eigenvectors of $G^*$
have less slope variations and their decay is rather similar
to the decay of the CheiRank vector $P^*(K^*)$.




Fig. 4. Fraction of invariant subspaces $F$ with dimensions
larger than $d$ as a function of the rescaled variable $x = d/\langle d \rangle$,
where $\langle d \rangle$ is the average subspace dimension. Left (right) panel
corresponds to the matrix $S$ ($S^*$) for the Twitter network
(thick red curve) with $\langle d \rangle = 2.30$ (2.72). The tail can be
fitted for $x > 0.5$ ($x > 10$) by the power law $F(x) = a/x^b$ with
$a = 0.092 \pm 0.011$ and $b = 2.60 \pm 0.07$ ($a = 0.0125 \pm 0.0008$ and
$b = 0.94 \pm 0.02$). The thin black line is $F(x) = (1 + 2x)^{-3/2}$
which corresponds to the universal behavior of $F(x)$ found in
Ref. [12] for the WWW of British university networks.

Finally, in Fig. 4 we use the approach developed in
[12] and analyze the dependence of the fraction of
invariant subspaces $F(x)$ with dimensions larger than $d$ on
the rescaled variable $x = d/\langle d \rangle$ where $\langle d \rangle$ is the average
subspace dimension. In [12] it was found that the British
University networks are characterized by a universal
functional distribution $F(x) = 1/(1 + 2x)^{3/2}$. For the Twitter
network we find significant deviations from such a
dependence, as is well seen in Fig. 4. The tail can be fitted by
the power law $F(x) \sim x^{-b}$ with the exponent $b = 2.60$ for
$G$ and $b = 0.94$ for $G^*$. It seems that with the increase of
the number of links per node we start to see deviations from
the above universal distribution: it is visible for the
Wikipedia network (see Fig. 7 in [12]) and becomes even more
pronounced for the Twitter network. We assume that a large
value of $\xi_\ell$ for Twitter leads to a change of the
percolation properties of the network, generating another type
of distribution $F$ whose properties should be studied in more
detail in further studies.
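Given the list of subspace dimensions, the empirical distribution plotted against the rescaled variable can be computed directly. The following is a minimal sketch with a hypothetical function name `subspace_tail_fraction` and toy input data.

```python
def subspace_tail_fraction(dims):
    """Return (<d>, [(x, F(x)), ...]) for a list of subspace dimensions.

    F is the fraction of subspaces with dimension larger than d,
    evaluated at each observed d and reported at x = d / <d>.
    """
    mean_d = sum(dims) / len(dims)
    points = []
    for d in sorted(set(dims)):
        frac = sum(1 for v in dims if v > d) / len(dims)
        points.append((d / mean_d, frac))
    return mean_d, points


# Toy data: mostly small subspaces with one large outlier.
mean_d, curve = subspace_tail_fraction([2, 2, 2, 3, 3, 4, 10])
```

The quadratic cost of this version is irrelevant for toy data; for tens of thousands of subspaces a sorted cumulative count would be preferable.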



4 CheiRank versus PageRank of Twitter 



As discussed in [6,13,16], each network node $i$ has its own
PageRank index $K(i)$ and CheiRank index $K^*(i)$ and,
hence, the ranking of network nodes becomes
two-dimensional (2DRanking). The distribution of Twitter nodes
in the PageRank-CheiRank plane $(K, K^*)$ is shown in
Fig. 5 (left column) in comparison to the case of the
Wikipedia network from [6,16] (right column). There are many
more nodes inside the square of size $K, K^* \leq 1000$ for
Twitter as compared to the case of Wikipedia. For the
squares of larger sizes the densities become comparable.
The global logarithmic density distribution is shown in the
bottom panels of Fig. 5 for both networks. The two
densities have certain similarities in their distributions: both
have a maximal density along a certain ridge along a line
$\ln K^* = \ln K + \mathrm{const}$. However, for the Twitter network
we have a significantly larger number of nodes at small
values $K, K^* \leq 1000$ while in the Wikipedia network this
area is practically empty.

The striking difference between the Twitter and Wikipedia
networks is in the number of points $N_K$ located
inside a square area of size $K \times K$ in the PageRank-CheiRank
plane. This is directly illustrated in Fig. 6: at
$K = 500$ there are 40 times more nodes for Twitter, and at
$K = 1000$ this ratio is around 6. We note that a
similar dependence $N_K$ was studied in [16] for Wikipedia,
British Universities and Linux Kernel networks (see Fig. 8
there), where in all cases the initial growth of $N_K$ was
significantly smaller as compared to the Twitter network
considered here.
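The count of nodes inside the square can be obtained for all $K$ in a single pass, since a node lies inside the $K \times K$ square exactly when the larger of its two rank indexes does not exceed $K$. A minimal sketch, with the hypothetical function name `nodes_in_square`:

```python
def nodes_in_square(k_page, k_chei):
    """Return the list curve with curve[K-1] = N_K for K = 1..N.

    k_page[i] and k_chei[i] are the PageRank and CheiRank indexes
    (both in 1..N) of node i.
    """
    n = len(k_page)
    hist = [0] * (n + 1)
    for a, b in zip(k_page, k_chei):
        hist[max(a, b)] += 1       # smallest square containing node i
    curve = []
    total = 0
    for k in range(1, n + 1):
        total += hist[k]
        curve.append(total)
    return curve


# Toy example with five nodes.
curve = nodes_in_square([1, 2, 3, 4, 5], [5, 1, 2, 4, 3])
```

By construction the last entry always equals the total number of nodes, while the initial growth of the curve measures how strongly the top nodes of both rankings overlap.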

Another important characteristic of 2DRanking is the
correlator $\kappa$ (2) between PageRank and CheiRank vectors.
We find for Twitter the value $\kappa = 112.60$ which is by a
factor $30 - 60$ larger compared to this value for Wikipedia
(4.08), Cambridge and Oxford University networks of 2006
considered in [6,12,16]. The origin of such a large value
of $\kappa$ for the Twitter network becomes clearer from the
analysis of the distribution of individual node contributions
$\kappa_i = N P(K(i)) P^*(K^*(i))$ in the correlator sum (2),
shown in Fig. 7. We see that there are certain nodes with
very large $\kappa_i$ values and even if there are only few of them
they still give a significant contribution to the total
correlator value. We note that there is a similar feature for
the Cambridge University network in 2011 as discussed
in [16], even if there one finds a smaller value $\kappa = 30$.
Thus we see that for certain nodes we have strongly
correlated large values of $P(K(i))$ and $P^*(K^*(i))$, explaining
the largest correlator value $\kappa$ among all networks studied
up to now. We will argue below that this is related to a
very strong inter-connectivity between top $K$ PageRank
users of the Twitter network.
Fig. 6. Dependence of the number of nodes $N_K$, counted inside
the square of size $K \times K$ on the PageRank-CheiRank plane, on $K$
for Twitter (blue curve) and Wikipedia (red curve); left panel
shows data for $1 \leq K \leq 1000$ in linear scale, right panel shows
data in log-log scale for the whole range of $K$.
Fig. 5. Density of nodes $W(K, K^*)$ on the PageRank-CheiRank
plane $(K, K^*)$ for Twitter (left panels) and Wikipedia (right
panels). Top panels show density in the range $1 \leq K, K^* \leq
1000$ with averaging over cells of size $10 \times 10$; middle panels
show the range $1 \leq K, K^* \leq 10^4$ with averaging over cells of
size $100 \times 100$; bottom panels show density averaged over
$100 \times 100$ logarithmically equidistant grids for
$0 \leq \ln K, \ln K^* \leq \ln N$; the density is averaged over all
nodes inside each cell of the grid, the normalization condition
is $\sum_{K,K^*} W(K, K^*) = 1$. Color varies from blue at zero
value to red at maximal density value. At each panel the
$x$-axis corresponds to $K$ (or $\ln K$ for the bottom panels) and
the $y$-axis to $K^*$ (or $\ln K^*$ for the bottom panels).

5 Discussion 

In this work we study the statistical properties of the
Google matrix of the Twitter network including its spectrum,
eigenstates and 2DRanking of PageRank and CheiRank
vectors. The comparison with Wikipedia shows that for
Twitter we have much stronger correlations between
PageRank and CheiRank vectors. Thus for the Twitter network
there are nodes which are very well known by the community
of users and at the same time they are very communicative,
being strongly connected with top PageRank


Fig. 7. Histogram of frequency appearance of correlator
components $\kappa_i = N P(K(i)) P^*(K^*(i))$ for networks of Twitter
(blue) and Wikipedia (red). For the histogram the whole in- 
terval 10~^° ^ i^i ^ 10^ is divided in 240 cells of equal size in 
logarithmic scale. 



nodes. We attribute the origin of this phenomenon to a
very strong connectivity between top $K$ nodes for Twitter
as compared to the Wikipedia network. This property is
illustrated in Fig. 8 where we show the number of nonzero
elements $N_G$ of the Google matrix, taken at $\alpha = 1$ and
counted in the top left corner with indexes smaller than or
equal to $K$ (elements in columns of dangling nodes are not
taken into account). We see that for $K \leq 1000$ the 2D
density of nonzero elements for Twitter is at a level
of 70% while for Wikipedia this density is smaller by a
factor 10. For these two networks the dependence of $N_G$ on
$K$ at $K \leq 1000$ is well described by a power law $N_G = a K^b$
with $a = 0.72 \pm 0.01$, $b = 1.993 \pm 0.002$ for Twitter and
$a = 2.10 \pm 0.01$, $b = 1.469 \pm 0.001$ for Wikipedia. Thus for
Twitter the top K < 1000 elements fill about 70% of the 
matrix and about 20% for size K < 10^. For Wikipedia 
the filling factor is smaller by a factor 10 — 20. An effec- 
tive number of links per node for top K nodes is given 
by the ratio $N_G/K$ which is equal to $\xi_\ell$ at $K = N$. The
dependence of this ratio on $K$ is shown in Fig. 8 in the right
panel. We see a striking difference between the Twitter
network and the networks of Wikipedia, Cambridge and Oxford
Universities. For Twitter the maximum value of $N_G/K$
is by two orders of magnitude larger as compared to the
University networks, and by a factor 20 larger than for
Wikipedia. Thus the Twitter network is characterized by
a very strong connectivity between top PageRank nodes
which can be considered as the Twitter elite [22].
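The number of links among the top $K$ PageRank nodes can be counted for all $K$ at once: a link between nodes $j$ and $i$ lies in the top left $K \times K$ corner exactly when the larger of the two rank indexes does not exceed $K$. The following is a minimal sketch with the hypothetical function name `top_block_link_counts`; only true links are counted, so columns of dangling nodes contribute nothing.

```python
def top_block_link_counts(links, k_page):
    """Return counts with counts[K-1] = number of links among the top K nodes.

    links:  iterable of (j, i) pairs, node j pointing to node i.
    k_page: k_page[i] is the PageRank index (1..N) of node i.
    """
    n = len(k_page)
    hist = [0] * (n + 1)
    for j, i in set(links):                      # each link counted once
        hist[max(k_page[i], k_page[j])] += 1     # smallest corner containing it
    counts = []
    total = 0
    for k in range(1, n + 1):
        total += hist[k]
        counts.append(total)
    return counts


# Toy example with four nodes already labeled in PageRank order.
toy_links = [(0, 1), (1, 0), (2, 0), (3, 2), (0, 1)]
counts = top_block_link_counts(toy_links, [1, 2, 3, 4])
area_density = [c / (k * k) for k, c in enumerate(counts, start=1)]
linear_density = [c / k for k, c in enumerate(counts, start=1)]
```

At $K = N$ the linear density reduces to the total number of links per node, which provides a simple consistency check.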




Fig. 8. Left panel: dependence of the area density
$N_G/K^2$ of nonzero elements of the adjacency matrix among top
PageRank nodes on the PageRank index $K$ for Twitter (blue
curve) and Wikipedia (red curve) networks, data are shown in
linear scale. Right panel: linear density $N_G/K$ of the same
matrix elements shown for the whole range of $K$ in log-log scale
for Twitter (blue curve), Wikipedia (red curve), Oxford
University 2006 (magenta curve) and Cambridge University 2006
(green curve) (curves from top to bottom at $K = 100$).



It is interesting to note that for $K \leq 20$ the Wikipedia
network has a larger value of the ratio $N_G/K^2$ compared
to the Twitter network, but the situation changes for
larger values $K > 20$. In fact the first top 20 nodes of
the Wikipedia network are mainly composed of world
countries (see [6]) which are strongly interconnected due to
historical reasons. However, at larger values of $K$ Wikipedia
starts to have articles on various subjects and the ratio
$N_G/K^2$ drops significantly. On the other hand, for the
Twitter network we see that a large group of very impor- 
tant persons (VIP) with K < 10^ is strongly intercon- 
nected. This dominant VIP structure has certain similar- 
ities with the structure of transnational corporations and 
their ownership network dominated by a small tightly-knit
core of financial institutions [24]. The existence of a solid
phase of industrially developed, strongly linked countries
is also established for the world trade network obtained
from the United Nations COMTRADE data base [25]. It
is possible that such a super-concentration of links between
top Twitter users results from a global increase of the
number of links per node characteristic for this type of social
network. Indeed, the recent analysis of the Facebook
network shows a significant decrease of the degree of separation
during the time evolution of this network [26]. Also the
number of friendship links per node reaches a value as high
as $\xi_\ell \approx 100$ at the current Facebook snapshot studied
in [26] (see Table 2 there). This significant growth of $\xi_\ell$
during the time evolution of social networks leads to an
enormous concentration of links among the society elite at top
PageRank users and may significantly influence the
process of strategic decisions on such networks in the future.



The growth of $\xi_\ell$ also leads to a significant decrease of the
exponent $\beta$ of algebraic decay of PageRank, which is known
to be $\beta \approx 0.9$ for the WWW (see e.g. [4,5,8]) while for
the Twitter network we find $\beta \approx 0.5$ (see also [22]). This
tendency may be a precursor of a delocalization transition
of the PageRank vector emerging at large values of $\xi_\ell$.
Such a delocalization would lead to a flat PageRank
probability distribution and a strong drop of the efficiency of
the information retrieval process. It is known that for the
Ulam networks of dynamical maps such a delocalization
indeed takes place under certain conditions [21,27].

Our results show that the strong inter-connectivity of
VIP users within about the top 1000 PageRank indexes
dominates the information flow on the network. This result is
in line with the recent studies of opinion formation on the
Twitter network [22] showing that the top 1300 PageRank
users of Twitter can impose their opinion on the whole
network of 41 million users. Thus we think that the
statistical analysis presented here plays a very important role
for a better understanding of decision making and opinion
formation on modern social networks.

The present size of the Twitter network is by a factor
3.5 larger as compared to its size in 2009 analyzed
in this work. Thus it would be very interesting to extend
the present analysis to the current status of the Twitter
network which now includes all layers of world society.
Such an analysis would allow a better understanding of
the process of information flow and decision making
on social networks.